Search CORE

25 research outputs found

Reliable and Real-Time Distributed Abstractions

Author: Kozhaya David
Publication venue: Lausanne, EPFL
Publication date: 05/12/2016
Field of study

The celebrated distributed computing approach for building systems and services using multiple machines continues to expand to new domains. Computation devices nowadays have additional sensing and communication capabilities, while becoming, at the same time, cheaper, faster and more pervasive. Consequently, areas like industrial control, smart grids and sensor networks are increasingly using such devices to control and coordinate system operations. However, compared to classic distributed systems, such real-world physical systems have different needs, e.g., real-time and energy efficiency requirements. Moreover, constraints that govern communication are also different. Networks become susceptible to inevitable random losses, especially when utilizing wireless and power line communication. This thesis investigates how to build various fundamental distributed computing abstractions (services) given the limitations, the performance and the application requirements and constraints of real-world control, smart grid and sensor systems. In quest of completeness, we discuss four distributed abstractions starting from the level of network links all the way up to the application level. At the link level, we show how to build an energy-efficient reliable communication service. This is especially important for devices with battery-powered wireless adapters where recharging might be unfeasible. We establish transmission policies that can be used by processes to decide when to transmit over the network in order to avoid losses and minimize re-transmissions. These policies allow messages to be reliably transmitted with minimum transmission energy. One level higher than links is failure detection, a software abstraction that relies on communication for identifying process crashes. We prove impossibility results concerning implementing classic eventual failure detectors in networks with probabilistic losses. We define a new implementable type of failure detectors, which preserves modularity. This means that existing deterministic algorithms using eventual failure detectors can still be used to solve certain distributed problems in lossy networks: we simply replace the existing failure detector with the one we define. Using failure detectors, processes might get information about failures at different times. However, to ensure dependability, environments such as distributed control systems (DCSs), require a membership service where processes agree about failures in real time. We prove that the necessary properties of this membership cannot be implemented deterministically, given probabilistic losses. We propose an algorithm that satisfies these properties, with high probability. We show analytically, as well as experimentally (within an industrial DCS), that our technique significantly enhances the DCS dependability, compared to classic membership services, at low additional cost. Finally, we investigate a real-time shared memory abstraction, which vastly simplifies programming control applications. We study the feasibility of implementing such an abstraction within DCSs, showing the impossibility of this task using traditional algorithms that are built on top of existing software blocks like failure detectors. We propose an approach that circumvents this impossibility by attaching information to the failure detection messages, analyze the performance of our technique and showcase ways of adapting it to various application needs and workloads

Infoscience - École polytechnique fédérale de Lausanne

You Only Live Multiple Times: A Blackbox Solution for Reusing Crash-Stop Algorithms In Realistic Crash-Recovery Settings

Author: Kozhaya David
Maric Ognjen
Pignolet Yvonne-Anne
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 22nd International Conference on Principles of Distributed Systems (OPODIS 2018)
Publication date: 01/01/2018
Field of study

Distributed agreement-based algorithms are often specified in a crash-stop asynchronous model augmented by Chandra and Toueg\u27s unreliable failure detectors. In such models, correct nodes stay up forever, incorrect nodes eventually crash and remain down forever, and failure detectors behave correctly forever eventually, However, in reality, nodes as well as communication links both crash and recover without deterministic guarantees to remain in some state forever. In this paper, we capture this realistic temporary and probabilitic behaviour in a simple new system model. Moreover, we identify a large algorithm class for which we devise a property-preserving transformation. Using this transformation, many algorithms written for the asynchronous crash-stop model run correctly and unchanged in real systems

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

RT-ByzCast: Byzantine-Resilient Real-Time Reliable Broadcast

Author: Decouchant Jérémie
Kozhaya David
Verissimo Paulo
Publication venue
Publication date: 01/03/2019
Field of study

Today’s cyber-physical systems face various impediments to achieving their intended goals, namely, communication uncertainties and faults, relative to the increased integration of networked and wireless devices, hinder the synchronism needed to meet real-time deadlines. Moreover, being critical, these systems are also exposed to significant security threats. This threat combination increases the risk of physical damage. This paper addresses these problems by studying how to build the first real-time Byzantine reliable broadcast protocol (RTBRB) tolerating network uncertainties, faults, and attacks. Previous literature describes either real-time reliable broadcast protocols, or asynchronous (non real-time) Byzantine ones. We first prove that it is impossible to implement RTBRB using traditional distributed computing paradigms, e.g., where the error/failure detection mechanisms of processes are decoupled from the broadcast algorithm itself, even with the help of the most powerful failure detectors. We circumvent this impossibility by proposing RT-ByzCast, an algorithm based on aggregating digital signatures in a sliding time-window and on empowering processes with self-crashing capabilities to mask and bound losses. We show that RT-ByzCast (i) operates in real-time by proving that messages broadcast by correct processes are delivered within a known bounded delay, and (ii) is reliable by demonstrating that correct processes using our algorithm crash themselves with a negligible probability, even with message loss rates as high as 60%

Open Repository and Bibliography - Luxembourg

Qualitative Analysis for Validating IEC 62443-4-2 Requirements in DevSecOps

Author: Göttel Christian
Kabir-Querrec Maëlle
Kozhaya David
Sivanthi Thanikesavan
Vuković Ognjen
Publication venue
Publication date: 23/10/2023
Field of study

Validation of conformance to cybersecurity standards for industrial automation and control systems is an expensive and time consuming process which can delay the time to market. It is therefore crucial to introduce conformance validation stages into the continuous integration/continuous delivery pipeline of products. However, designing such conformance validation in an automated fashion is a highly non-trivial task that requires expert knowledge and depends upon the available security tools, ease of integration into the DevOps pipeline, as well as support for IT and OT interfaces and protocols. This paper addresses the aforementioned problem focusing on the automated validation of ISA/IEC 62443-4-2 standard component requirements. We present an extensive qualitative analysis of the standard requirements and the current tooling landscape to perform validation. Our analysis demonstrates the coverage established by the currently available tools and sheds light on current gaps to achieve full automation and coverage. Furthermore, we showcase for every component requirement where in the CI/CD pipeline stage it is recommended to test it and the tools to do so

arXiv.org e-Print Archive

Facing the Safety-Security Gap in RTES: the Challenge of Timeliness

Author: Kozhaya David
Verissimo Paulo
Volp Marcus
Publication venue
Publication date: 01/12/2017
Field of study

Safety-critical real-time systems, including real-time cyber-physical and industrial control systems, need not be solely correct but also timely. Untimely (stale) results may have severe consequences that could render the control system’s behaviour hazardous to the physical world. To ensure predictability and timeliness, developers follow a rigorous process, which essentially ensures real-time properties a priori, in all but the most unlikely combinations of circumstances. However, we have seen the complexity of both real-time applications, and the environments they run on, increase. If this is matched with the also increasing sophistication of attacks mounted to RTES systems, the case for ensuring both safety and security through aprioristic predictability loses traction, and presents an opportunity, which we take in this paper, for discussing current practices of critical realtime system design. To this end, with a slant on low-level task scheduling, we first investigate the challenges and opportunities for anticipating successful attacks on real-time systems. Then, we propose ways for adapting traditional fault- and intrusiontolerant mechanisms to tolerate such hazards. We found that tasks which typically execute as analyzed under accidental faults, may exhibit fundamentally different behavior when compromised by malicious attacks, even with interference enforcement in place

Open Repository and Bibliography - Luxembourg

Right On Time Distributed Shared Memory

Author: Guerraoui Rachid
Kozhaya David
Pignolet-Oswald Yvonne-Anne
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 14/12/2016
Field of study

The demand for real-time data storage in distributed control systems (DCSs) is growing. Yet, providing real- time DCS guarantees is challenging, especially when more and more sensor and actuator devices are connected to industrial plants and message loss needs to be taken into account. In this paper, we investigate how to build a shared memory abstraction for DCSs as a first step towards implementing different shared storage systems in a DCS context. We first prove that, in the presence of host crashes and message losses, the necessary guarantees of such an abstraction are impossible to implement using a traditional approach that has no access to the internals of existing DCS services, e.g., a modular approach where algorithms are built on top of existing software blocks like failure detectors. We propose a white-box approach that utilizes messages of existing services in any DCS as the sole means of communication. More precisely, we present TapeWorm, an algorithm that attaches itself to the heartbeat messages of the failure detector component in DCSs. We prove that TapeWorm implements the desired shared memory guarantees for applications running on a DCS. We also analyze the performance of TapeWorm and we showcase ways of adapting TapeWorm to various application needs and workloads

Infoscience - École polytechnique fédérale de Lausanne

Crossref

RepuCoin: Your Reputation is Your Power

Author: David Kozhaya
Jeremie Decouchant
Jiangshan Yu
Paulo Esteves-Verissimo
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 03/02/2019
Field of study

Existing proof-of-work cryptocurrencies cannot tolerate attackers controlling more than 50% of the network’s computing power at any time, but assume that such a condition happening is “unlikely”. However, recent attack sophistication, e.g., where attackers can rent mining capacity to obtain a majority of computing power temporarily, render this assumption unrealistic. This paper proposes RepuCoin, the first system to provide guarantees even when more than 50% of the system’s computing power is temporarily dominated by an attacker. RepuCoin physically limits the rate of voting power growth of the entire system. In particular, RepuCoin defines a miner’s power by its ‘reputation’, as a function of its work integrated over the time of the entire blockchain, rather than through instantaneous computing power, which can be obtained relatively quickly and/or temporarily. As an example, after a single year of operation, RepuCoin can tolerate attacks compromising 51% of the network’s computing resources, even if such power stays maliciously seized for almost a whole year. Moreover, RepuCoin provides better resilience to known attacks, compared to existing proof-of-work systems, while achieving a high throughput of 10000 transactions per second (TPS)

Cryptology ePrint Archive

Byzantine Resilient Protocol for the IoT

Author: Fröhlich Antônio Augusto
Kozhaya David
Scheffel M.Roberto
Verissimo Paulo
Publication venue
Publication date: 19/09/2018
Field of study

Wireless sensor networks, often adhering to a single gateway architecture, constitute the communication backbone for many modern cyber-physical systems. Consequently, faulttolerance in CPS becomes a challenging task, especially when accounting for failures (potentially malicious) that incapacitate the gateway or disrupt the nodes-gateway communication, not to mention the energy, timeliness, and security constraints demanded by CPS domains. This paper aims at ameliorating the fault-tolerance of WSN based CPS to increase system and data availability. To this end, we propose a replicated gateway architecture augmented with energy-efficient real-time Byzantineresilient data communication protocols. At the sensors level, we introduce FT-TSTP, a geographic routing protocol capable of delivering messages in an energy-efficient and timely manner to multiple gateways, even in the presence of voids caused by faulty and malicious sensor nodes. At the gateway level, we propose a multi-gateway synchronization protocol, which we call ByzCast, that delivers timely correct data to CPS applications, despite the failure or maliciousness of a number of gateways. We show, through extensive simulations, that our protocols provide better system robustness yielding an increased system and data availability while meeting CPS energy, timeliness, and security demands

Open Repository and Bibliography - Luxembourg

RepuCoin: Your Reputation is Your Power

Author: Decouchant Jérémie
Kozhaya David
Verissimo Paulo
Yu Jiangshan
Publication venue
Publication date: 01/01/2019
Field of study

Open Repository and Bibliography - Luxembourg